Towards Domain-Independent Deep Linguistic Processing: Ensuring Portability and Re-Usability of Lexicalised Grammars
Abstract
In this paper we illustrate the importance of making detailed linguistic information a central part of the automatic acquisition of large-scale lexicons, as a means of enhancing robustness while ensuring the maintainability and re-usability of deep lexicalised grammars. Using the error mining techniques proposed in (van Noord, 2004), we show that low lexical coverage is the main hindrance both to the portability of deep lexicalised grammars to domains other than those for which they were originally developed and to the robustness of systems using such grammars. To address this, we develop linguistically driven methods that use detailed morphosyntactic information to automatically enhance the performance of deep lexicalised grammars while maintaining their typically high linguistic quality.
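As a rough illustration of the error mining idea referred to in the abstract, the sketch below ranks word n-grams by parsability, that is, the share of sentences containing the n-gram that the grammar parsed successfully. It is a simplified reading of the technique in (van Noord, 2004), not the paper's implementation: the function name `parsability`, the `min_count` threshold, and the toy corpus are assumptions made for this example only.

```python
from collections import Counter

def parsability(sentences, n=1, min_count=2):
    """Rank word n-grams by the fraction of their sentences that parsed.

    sentences: iterable of (tokens, parsed_ok) pairs (hypothetical input format).
    Returns (ngram, score) pairs sorted from least to most parsable.
    """
    total = Counter()  # sentences containing the n-gram
    ok = Counter()     # successfully parsed sentences containing the n-gram
    for tokens, parsed_ok in sentences:
        # Count each n-gram at most once per sentence.
        grams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        for gram in grams:
            total[gram] += 1
            if parsed_ok:
                ok[gram] += 1
    # Ignore rare n-grams, whose scores are unreliable (assumed threshold).
    scores = {g: ok[g] / c for g, c in total.items() if c >= min_count}
    return sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))

# Invented toy data: "frobnicator" stands in for a word missing from the lexicon.
corpus = [
    (["the", "dog", "barks"], True),
    (["the", "frobnicator", "barks"], False),
    (["a", "frobnicator", "sleeps"], False),
    (["a", "dog", "sleeps"], True),
]
ranked = parsability(corpus, n=1, min_count=2)
```

N-grams at the top of `ranked` occur mostly in sentences that failed to parse, so they are candidates for missing or incomplete lexical entries.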
Similar resources
Multilingual Question Answering With High Portability On Relational Databases
This paper describes a highly portable multilingual question answering system for multiple relational databases. We apply semantic-category and pattern-based grammars to natural language interfaces to relational databases. Lexico-semantic pattern (LSP) and multi-level grammars achieve portability across languages, domains, and DBMSs. The LSP-based linguistic processing does not require deep analy...
XMG: a Multi-formalism Metagrammatical Framework
In this paper we introduce XMG (eXtensible MetaGrammar), a system dedicated to the production of wide coverage lexicalised grammars. In particular, we show that XMG provides a representation language suitable for describing different linguistic dimensions and different grammatical formalisms. Furthermore, we briefly sketch the architecture of the XMG compiler showing that it encodes a theoretic...
Baldwin, Timothy (2007) Scalable Deep Linguistic Processing: Mind the Lexical Gap. In Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation (PACLIC21), Seoul, Korea, pp. 3-12
Coverage has been a constant thorn in the side of deployed deep linguistic processing applications, largely because of the difficulty in constructing, maintaining and domain-tuning the complex lexicons that they rely on. This paper reviews various strands of research on deep lexical acquisition (DLA), i.e. the (semi-)automatic creation of linguistically-rich language resources, particularly fro...
Trailfinder - A Case Study in Extracting Spatial Information Using Deep Language Processing
The present paper reports on an end-to-end application using a deep processing grammar to extract spatial and temporal information of prepositional and adverbial expressions from running text. The extraction process is based on the full understanding of the input text. It is represented in a formalism standard for unification-based grammars and with a language-independent vocabulary as far as s...
What grammars tell us about corpora: the case of reduced relative clauses. Paola
We present a large (65 million words of Wall Street Journal) and in-depth corpus study of a particular syntactic ambiguity to investigate (1) to what extent the structure of a grammar is reflected in a corpus, and (2) how probability functions defined according to a grammar fit independently established measures of syntactic disambiguation preference. We look at the well-known case of the ambiguity...